NSF PAR Search | NSF Public Access Repository

Aligning LLM Agents by Learning Latent Preference from User Edits

Gao, Ge; Taymanov, Alexey; Salinas, Eduardo; Mineiro, Paul; Misra, Dipendra (December 2024, NeurIPS)

We study interactive learning of LLM-based language agents from user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference, and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, that infers a description of the user's latent preference based on historical edit data. The inferred user preference descriptions are used to define prompts for generating responses in the future. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preferences improves interpretability, allowing the user to view and modify the learned preference. However, user preference can be complex, subtle, and vary based on context, making it challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages the LLM to infer the user preference for a given context based on user edits. In the future, CIPHER retrieves inferred preferences from the k-closest contexts in the history, and forms an aggregate preference for response generation. We introduce two interactive environments, summarization and email writing, and use a GPT-4 simulated user for evaluation. On both tasks, CIPHER outperforms several baselines by achieving the lowest edit distance cost while incurring only a small overhead in LLM query cost. Our analysis shows that user preferences learned by CIPHER are significantly similar to the ground truth latent preferences.
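The retrieval step described in the abstract above (look up the k closest past contexts, combine their inferred preferences, and measure cost as edit distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bag-of-words `embed` function, the cosine similarity, the string-joining `aggregate` helper, and all function names are assumptions chosen to keep the example self-contained; a real system would presumably use a learned context encoder and LLM calls for inference and generation.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real context encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_preferences(history, context, k=2):
    """Return preferences stored with the k contexts most similar to `context`.

    `history` is a list of (past_context, inferred_preference) pairs.
    """
    query = embed(context)
    ranked = sorted(
        history, key=lambda item: cosine(embed(item[0]), query), reverse=True
    )
    return [pref for _, pref in ranked[:k]]


def aggregate(preferences):
    """Hypothetical aggregation: de-duplicate in order and join into one description."""
    seen = []
    for pref in preferences:
        if pref not in seen:
            seen.append(pref)
    return "; ".join(seen)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g. token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


# Illustrative history of (context, inferred preference) pairs; the data is made up.
history = [
    ("summarize this news article about politics", "short bullet points"),
    ("write an email to my manager", "formal tone"),
    ("summarize this research paper abstract", "second person, concise"),
]
prompt_preference = aggregate(
    retrieve_preferences(history, "summarize this article about sports", k=2)
)
```

In this sketch, the aggregated preference string would be placed in the prompt for the next response, and the token-level edit distance between the agent's response and the user's edited version would serve as the cost for that round.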
Full Text Available
Understanding Contrastive Learning Requires Incorporating Inductive Biases

Saunshi, Nikunj; Ash, Jordan; Goel, Surbhi; Misra, Dipendra; Zhang, Cyril; Arora, Sanjeev; Kakade, Sham; Krishnamurthy, Akshay (January 2022, Proceedings of Machine Learning Research)

Full Text Available
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

https://doi.org/10.1109/CVPR.2019.01282

Chen, Howard; Suhr, Alane; Misra, Dipendra; Snavely, Noah; Artzi, Yoav (June 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction

Blukis, Valts; Misra, Dipendra; Knepper, Ross A.; Artzi, Yoav (January 2018, Proceedings of The 2nd Conference on Robot Learning)

We propose an approach for mapping natural language instructions and raw observations to continuous control of a quadcopter drone. Our model predicts interpretable position-visitation distributions indicating where the agent should go during execution and where it should stop, and uses the predicted distributions to select the actions to execute. This two-step model decomposition allows for simple and efficient training using a combination of supervised learning and imitation learning. We evaluate our approach with a realistic drone simulator, and demonstrate absolute task-completion accuracy improvements of 16.85% over two state-of-the-art instruction-following methods.
Full Text Available
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction

Misra, Dipendra; Bennett, Andrew; Blukis, Valts; Niklasson, Eyvind; Shatkhin, Max; Artzi, Yoav (January 2018, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing)

We propose to decompose instruction execution into goal prediction and action generation. We design a model that maps raw visual observations to goals using LINGUNET, a language-conditioned image generation network, and then generates the actions required to complete them. Our model is trained from demonstrations only, without external resources. To evaluate our approach, we introduce two benchmarks for instruction following: LANI, a navigation task; and CHAI, where an agent executes household instructions. Our evaluation demonstrates the advantages of our model decomposition, and illustrates the challenges posed by our new benchmarks.
Full Text Available
